116 research outputs found

CAGE Basic/Analysis Databases: the CAGE resource for comprehensive promoter analysis

Author: Carninci Piero
Fukuda Shiro
Hayashizaki Yoshihide
Kai Chikatoshi
Kasukawa Takeya
Katayama Shintaro
Kawai Jun
Kawaji Hideya
Publication venue: Oxford University Press
Publication date: 28/12/2005

Cap-analysis gene expression (CAGE) Basic and Analysis Databases store an original resource produced by CAGE, which measures expression levels of transcription starting sites by sequencing large amounts of transcript 5′ ends, termed CAGE tags. Millions of human and mouse high-quality CAGE tags derived from different conditions in >20 tissues consisting of >250 RNA samples are essential for identification of novel promoters and promoter characterization in the aspect of expression profile. CAGE Basic Database is a primary database of the CAGE resource, RNA samples, CAGE libraries, CAGE clone and tag sequences and so on. CAGE Analysis Database stores promoter related information, such as counts of related transcripts, CpG islands and conserved genome region. It also provides expression profiles at base pair and promoter levels. Both databases are based on the same framework, CAGE tag starting sites, tag clusters for defining promoters and transcriptional units (TUs). Their associations and TU attributes are available to find promoters of interest. These databases were provided for Functional Annotation Of Mouse 3 (FANTOM3), an international collaboration research project focusing on expanding the transcriptome and subsequent analyses. Now access is free for all users through the World Wide Web at

CiteSeerX

Crossref

PubMed Central

SkewC : Identifying cells with skewed gene body coverage in single-cell RNA sequencing data

Author: Abugessaisa Imad
Cardon Melissa
Hasegawa Akira
Kasukawa Takeya
Katayama Shintaro
Kere Juha
Noguchi Shuhei
Suzuki Harukazu
Takahashi Masataka
Watanabe Kazuhide
Publication date: 15/01/2022

The analysis and interpretation of single-cell RNA sequencing (scRNA-seq) experiments are compromised by the presence of poor-quality cells. For meaningful analyses, such poor-quality cells should be excluded as they introduce noise in the data. We introduce SkewC, a quality-assessment tool, to identify skewed cells in scRNA-seq experiments. The tool's methodology is based on the assessment of gene coverage for each cell, and its skewness as a quality measure; the gene body coverage is a unique characteristic for each protocol, and different protocols yield highly different coverage profiles. This tool is designed to avoid misclustering or false clusters by identifying, isolating, and removing cells with skewed gene body coverage profiles. SkewC is capable of processing any type of scRNA-seq dataset, regardless of the protocol. We envision SkewC as a distinctive QC method to be incorporated into scRNA-seq QC processing to preclude the possibility of scRNA-seq data misinterpretation.
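
The idea of scoring a cell by the skewness of its gene body coverage can be illustrated with a small sketch. This is not SkewC's implementation: the function names, the representation of a coverage profile as a list of per-position read counts ordered from 5′ to 3′, and the cutoff of 0.5 are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def coverage_skewness(profile: Sequence[float]) -> float:
    """Weighted skewness of position along the gene body.

    `profile[i]` is the (hypothetical) read coverage at relative position i,
    ordered 5' to 3'. Coverage values act as weights on the positions.
    A symmetric profile gives ~0; a 3'-heavy profile gives a negative value;
    a 5'-heavy profile gives a positive value.
    """
    total = float(sum(profile))
    if total <= 0:
        raise ValueError("coverage profile has no reads")
    positions = range(len(profile))
    mean = sum(w * x for x, w in zip(positions, profile)) / total
    m2 = sum(w * (x - mean) ** 2 for x, w in zip(positions, profile)) / total
    m3 = sum(w * (x - mean) ** 3 for x, w in zip(positions, profile)) / total
    if m2 == 0:
        # All coverage sits at one position: skewness is undefined, report 0.
        return 0.0
    return m3 / m2 ** 1.5


def flag_skewed_cells(
    profiles: Dict[str, Sequence[float]], cutoff: float = 0.5
) -> List[str]:
    """Return ids of cells whose absolute skewness exceeds an assumed cutoff."""
    return [
        cell_id
        for cell_id, profile in profiles.items()
        if abs(coverage_skewness(profile)) > cutoff
    ]


if __name__ == "__main__":
    cells = {
        "cell_uniform": [10, 10, 10, 10, 10],
        "cell_3prime": [0, 0, 1, 5, 50],
    }
    print(flag_skewed_cells(cells))
```

Because the abstract notes that coverage profiles differ strongly between protocols, a fixed cutoff like the one above would likely need to be chosen per protocol, or replaced by a comparison against the typical profile of the dataset.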

PubMed Central

Helsingin yliopiston digitaalinen arkisto

EpiFactors : a comprehensive database of human epigenetic factors and complexes

Author: Andreas Lennartsson
Finn Drabløs
Grigory Khimulya
Ilya E. Vorontsov
Ivan V. Kulakovskiy
Jaenisch
Pouda Panahandeh
Rezvan Ehsani
Takeya Kasukawa
Yulia A. Medvedeva
Zhu
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

Other funding: Russian Fund For Basic Research (RFFI) grant 14-04-0018 and grant 15-34-20423, Ake Olsson's foundation, Swedish Cancer foundation, Swedish Childhood cancer foundation, Dynasty Foundation Fellowship, RIKEN Omics Science Center, RIKEN Preventive Medicine and Diagnosis Innovation Program and RIKEN Center for Life Science Technologies.

Abstract: Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has resulted in the need for a resource that compiles, organizes and presents curated information to the researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (for approximately 200 cell types, in many cases from three individual donors), covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes) and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression technique. EpiFactors also contains information on 69 protein complexes that are involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

PubMed Central

Diposit Digital de Documents de la UAB

NORA - Norwegian Open Research Archives

Differential Use of Signal Peptides and Membrane Domains Is a Common Occurrence in the Protein Output of Transcriptional Units

Author: Bill Pavan
Chikatoshi Kai
Fasheng Zhang
Francis Clark
J. Lynn Fink
John Hancock
Judith Blake
Jun Kawai
Kelly A Hanson
Lisa Stubbs
Melissa J Davis
Piero Carninci
Rohan D Teasdale
Takeya Kasukawa
vonHeijne G
Yoshihide Hayashizaki
Publication venue: Public Library of Science
Publication date: 01/01/2006

Membrane organization describes the orientation of a protein with respect to the membrane and can be determined by the presence, or absence, and organization within the protein sequence of two features: endoplasmic reticulum signal peptides and alpha-helical transmembrane domains. These features allow protein sequences to be classified into one of five membrane organization categories: soluble intracellular proteins, soluble secreted proteins, type I membrane proteins, type II membrane proteins, and multi-spanning membrane proteins. Generation of protein isoforms with variable membrane organizations can change a protein's subcellular localization or association with the membrane. Application of MemO, a membrane organization annotation pipeline, to the FANTOM3 Isoform Protein Sequence mouse protein set revealed that within the 8,032 transcriptional units (TUs) with multiple protein isoforms, 573 had variation in their use of signal peptides, 1,527 had variation in their use of transmembrane domains, and 615 generated protein isoforms from distinct membrane organization classes. The mechanisms underlying these transcript variations were analyzed. While TUs were identified encoding all pairwise combinations of membrane organization categories, the most common was conversion of membrane proteins to soluble proteins. Observed within our high-confidence set were 156 TUs predicted to generate both extracellular soluble and membrane proteins, and 217 TUs generating both intracellular soluble and membrane proteins. The differential use of endoplasmic reticulum signal peptides and transmembrane domains is a common occurrence within the variable protein output of TUs. The generation of protein isoforms that are targeted to multiple subcellular locations represents a major functional consequence of transcript variation within the mouse transcriptome.
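
The five-way classification described in this abstract can be sketched as a small decision rule. The sketch below is not the MemO pipeline: it assumes the two features have already been predicted for each isoform, and the mapping used (signal peptide plus one transmembrane domain gives type I, one transmembrane domain without a signal peptide gives type II) is the conventional textbook rule, adopted here as an assumption. All names are hypothetical.

```python
from enum import Enum
from typing import Iterable, Tuple


class MembraneOrganization(Enum):
    SOLUBLE_INTRACELLULAR = "soluble intracellular"
    SOLUBLE_SECRETED = "soluble secreted"
    TYPE_I = "type I membrane"
    TYPE_II = "type II membrane"
    MULTI_SPANNING = "multi-spanning membrane"


def classify_membrane_organization(
    has_signal_peptide: bool, n_tm_domains: int
) -> MembraneOrganization:
    """Assign one of five categories from two predicted features (assumed rule)."""
    if n_tm_domains < 0:
        raise ValueError("n_tm_domains must be non-negative")
    if n_tm_domains == 0:
        # No transmembrane domain: the protein is soluble.
        if has_signal_peptide:
            return MembraneOrganization.SOLUBLE_SECRETED
        return MembraneOrganization.SOLUBLE_INTRACELLULAR
    if n_tm_domains == 1:
        # Single-pass membrane protein: the signal peptide decides the type.
        if has_signal_peptide:
            return MembraneOrganization.TYPE_I
        return MembraneOrganization.TYPE_II
    return MembraneOrganization.MULTI_SPANNING


def has_distinct_classes(isoforms: Iterable[Tuple[bool, int]]) -> bool:
    """True if the isoforms of one transcriptional unit span more than one class."""
    classes = {classify_membrane_organization(sp, tm) for sp, tm in isoforms}
    return len(classes) > 1


if __name__ == "__main__":
    # A hypothetical TU with one membrane-bound and one secreted isoform.
    tu_isoforms = [(True, 1), (True, 0)]
    print(has_distinct_classes(tu_isoforms))
```

Applied per transcriptional unit, a check like `has_distinct_classes` corresponds to the kind of count the abstract reports for TUs whose isoforms fall into distinct membrane organization classes.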

Public Library of Science (PLOS)

Crossref

Adelaide Research & Scholarship

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

University of Queensland eSpace

Alternate transcription of the Toll-like receptor signaling cascade

Author: Carninci Piero
Chalk Alistair M
Faulkner Geoffrey
Forrest Alistair
Grimmond Sean M
Hayashizaki Yoshihide
Himes S Roy
Hume David A
Kai Chikatoshi
Kasukawa Takeya
Katayama Shintaro
Kawai Jun
Kawaji Hideya
Lo Sandra
Schroder Kate
Taylor Darrin
Waddell Nic
Wells Christine A
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Alternate splicing of key signaling molecules in the Toll-like receptor (Tlr) cascade has been shown to dramatically alter the signaling capacity of inflammatory cells, but it is not known how common this mechanism is. We provide transcriptional evidence of widespread alternate splicing in the Toll-like receptor signaling pathway, derived from a systematic analysis of the FANTOM3 mouse data set. Functional annotation of variant proteins was assessed in light of inflammatory signaling in mouse primary macrophages, and the expression of each variant transcript was assessed by splicing arrays. RESULTS: A total of 256 variant transcripts were identified, including novel variants of Tlr4, Ticam1, Tollip, Rac1, Irak1, 2 and 4, Mapk14/p38, Atf2 and Stat1. The expression of variant transcripts was assessed using custom-designed splicing arrays. We functionally tested the expression of Tlr4 transcripts under a range of cytokine conditions via northern and quantitative real-time polymerase chain reaction. The effects of variant Mapk14/p38 protein expression on macrophage survival were demonstrated. CONCLUSION: Members of the Toll-like receptor signaling pathway are highly alternatively spliced, producing a large number of novel proteins with the potential to functionally alter inflammatory outcomes. These variants are expressed in primary mouse macrophages in response to inflammatory mediators such as interferon-γ and lipopolysaccharide. Our data suggest a surprisingly common role for variant proteins in diversification/repression of inflammatory signaling.

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

University of Melbourne Institutional Repository

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Author: Abugessaisa Imad
Aitken Stuart
Bessière Chloé
Frith Martin C.
Grapotte Mathys
Hasegawa Akira
Hayashizaki Yoshihide
Itoh Masayoshi
Kasukawa Takeya
Kojima-Ishiyama Miki
Menichelli Christophe
Murata Mitsuyoshi
Nishiyori-Sueki Hiromi
Noguchi Shuhei
Noma Shohei
Ramilowski Jordan A.
Saraswat Manu
Severin Jessica
Suzuki Harukazu
Tagami Michihira
Publication venue: FIU Digital Commons
Publication date: 01/12/2021

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

DigitalCommons@Florida International University

Comparative transcriptomics of primary cells in vertebrates

Author: Abugessaisa Imad
Agrawal Saumya
Alam Tanvir
Andersson Robin
Arner Erik
Carninci Piero
de Hoon Michiel J. L.
Forrest Alistair R. R.
Hasegawa Akira
Hayashizaki Yoshihide
Ishizu Yuri
Itoh Masayoshi
Kasukawa Takeya
Kawaji Hideya
Khachigian Levon M.
Lassmann Timo
Lizio Marina
Marchionni Luigi
Noma Shohei
Ramilowski Jordan A.
Severin Jessica
Sheng Guojun
Tarui Hiroshi
Taylor Martin S.
Young Robert S.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/01/2020

Copenhagen University Research Information System

Edinburgh Research Explorer

Author Correction: Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Author: Bessière Chloé
Bréhélin Laurent
Carninci Piero
Chatelain Clément
de Hoon Michiel J. L.
Fantom consortium
Frith Martin C.
Grapotte Mathys
Hasegawa Akira
Hayashizaki Yoshihide
Itoh Masayoshi
Kasukawa Takeya
Kojima-Ishiyama Miki
Lecellier Charles-Henri
Menichelli Christophe
Murata Mitsuyoshi
Nishiyori-Sueki Hiromi
Noguchi Shuhei
Noma Shohei
Ramilowski Jordan A.
Saraswat Manu
Severin Jessica
Suzuki Harukazu
Tagami Michihira
Wasserman Wyeth W.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

edoc

PubMed Central

Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

Author: Aturaliya Rajith N
Batalov Serge
Beisel Kirk W
Bult Carol J
Carninci Piero
Engström Pär G
Fletcher Colin F
Forrest Alistair R. R
Frith Martin
Furuno Masaaki
Gough Julian
Hayashizaki Yoshihide
Hill David
Hume David A
Itoh Masayoshi
Kai Chikatoshi
Kanamori-Katayama Mutsumi
Kasukawa Takeya
Katayama Shintaro
Katoh Masaru
Kawai Jun
Kawashima Tsugumi
Lenhard Boris
Maeda Norihiro
Oyama Rieko
Quackenbush John
Ravasi Timothy
Ring Brian Z
Shibata Kazuhiro
Sugiura Koji
Takenaka Yoichi
Teasdale Rohan D
Wells Christine A
Zhu Yunxia
Publication venue: Public Library of Science
Publication date: 01/01/2006

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

The Novartis Repository

University of Melbourne Institutional Repository

Explore Bristol Research

University of Queensland eSpace